|
The iterative proportional fitting procedure (IPFP, also known as biproportional fitting in statistics, RAS algorithm in economics and matrix raking or matrix scaling in computer science) is an iterative algorithm for estimating cell values of a contingency table such that the marginal totals remain fixed and the estimated table decomposes into an outer product.

The procedure was first introduced by Deming and Stephan in 1940. They proposed IPFP as an algorithm leading to a minimizer of the Pearson X-squared statistic, which it ''does not'' yield, and they did not prove convergence. It has since seen various extensions and related research. A rigorous proof of convergence by means of differential geometry is due to Fienberg (1970). He interpreted the family of contingency tables of constant crossproduct ratios as a particular (''IJ'' − 1)-dimensional manifold of constant interaction and showed that the IPFP is a fixed-point iteration on that manifold. His proof, however, assumed strictly positive observations. Generalization to tables with zero entries is still considered a hard and only partly solved problem. An exhaustive treatment of the algorithm and its mathematical foundations can be found in the book of Bishop et al. (1975). The first general proof of convergence, built on non-trivial measure theoretic theorems and entropy minimization, is due to Csiszár (1975). More recent results on convergence and error behavior have been published by Pukelsheim and Simeone (2009). They proved simple necessary and sufficient conditions for the convergence of the IPFP for arbitrary two-way tables (i.e. tables that may contain zero entries) by analysing an <math>L_1</math>-error function.

Other general algorithms can be modified to yield the same limit as the IPFP, for instance the Newton–Raphson method and the EM algorithm. In most cases, IPFP is preferred due to its computational speed, numerical stability and algebraic simplicity.
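== Algorithm 1 (classical IPFP) ==
Given a two-way (''I'' × ''J'')-table of counts <math>(x_{ij})</math>, where the cell values are assumed to be Poisson or multinomially distributed, we wish to estimate a decomposition <math>\hat{m}_{ij} = a_i b_j</math> for all ''i'' and ''j'' such that <math>(\hat{m}_{ij})</math> is the maximum likelihood estimate (MLE) of the expected values <math>(m_{ij})</math> leaving the marginals <math>x_{i+} = \sum_j x_{ij}</math> and <math>x_{+j} = \sum_i x_{ij}</math> fixed. The assumption that the table factorizes in such a manner is known as the ''model of independence'' (I-model). Written in terms of a log-linear model, this assumption reads <math>\log m_{ij} = u + v_i + w_j + z_{ij}</math>, where <math>m_{ij} := \mathbb{E}(x_{ij})</math>, <math>\sum_i v_i = \sum_j w_j = 0</math> and the interaction term vanishes, that is <math>z_{ij} = 0</math> for all ''i'' and ''j''.

Choose initial values <math>\hat{m}_{ij}^{(0)} := 1</math> (different choices of initial values may lead to changes in convergence behavior), and for <math>\eta \geq 1</math> set

:<math>\hat{m}_{ij}^{(2\eta-1)} = \frac{\hat{m}_{ij}^{(2\eta-2)} \, x_{i+}}{\sum_{k=1}^{J} \hat{m}_{ik}^{(2\eta-2)}}</math>

:<math>\hat{m}_{ij}^{(2\eta)} = \frac{\hat{m}_{ij}^{(2\eta-1)} \, x_{+j}}{\sum_{k=1}^{I} \hat{m}_{kj}^{(2\eta-1)}}</math>

Notes:
* Convergence does not depend on the actual distribution. Distributional assumptions are only needed to infer that the limit <math>(\hat{m}_{ij}) := \lim_{\eta \to \infty} (\hat{m}_{ij}^{(\eta)})</math> is indeed an MLE.
* IPFP can be manipulated to generate any positive marginals by replacing <math>x_{i+}</math> by the desired row marginal (analogously for the column marginals).
* IPFP can be extended to fit the ''model of quasi-independence'' (Q-model), where <math>m_{ij} = 0</math> is known a priori for <math>(i,j) \in S</math>. Only the initial values have to be changed: Set <math>\hat{m}_{ij}^{(0)} = 0</math> if <math>(i,j) \in S</math> and 1 otherwise.

Source: ''Iterative proportional fitting'', Wikipedia, the free encyclopedia.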
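The alternating row and column updates of Algorithm 1 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function name <code>ipfp</code> and the parameters <code>row_targets</code>, <code>col_targets</code>, <code>structural_zeros</code>, <code>max_iter</code> and <code>tol</code> are hypothetical names chosen for this example, and the stopping rule (summed absolute row-marginal error below <code>tol</code>) is an assumption that the text above does not prescribe.

```python
def ipfp(counts, row_targets=None, col_targets=None,
         structural_zeros=(), max_iter=1000, tol=1e-10):
    """Alternately rescale rows and columns of a two-way table.

    All parameter names are illustrative assumptions, not a standard API.
    """
    n_rows, n_cols = len(counts), len(counts[0])
    # Default targets are the observed marginals x_{i+} and x_{+j}.
    if row_targets is None:
        row_targets = [sum(row) for row in counts]
    if col_targets is None:
        col_targets = [sum(counts[i][j] for i in range(n_rows))
                       for j in range(n_cols)]
    zeros = set(structural_zeros)
    # Initial values: 1 everywhere, 0 in the cells of S (Q-model).
    m = [[0.0 if (i, j) in zeros else 1.0 for j in range(n_cols)]
         for i in range(n_rows)]
    for _ in range(max_iter):
        # Odd step: scale each row to its target marginal.
        for i in range(n_rows):
            s = sum(m[i])
            if s > 0:
                factor = row_targets[i] / s
                m[i] = [v * factor for v in m[i]]
        # Even step: scale each column to its target marginal.
        for j in range(n_cols):
            s = sum(m[i][j] for i in range(n_rows))
            if s > 0:
                factor = col_targets[j] / s
                for i in range(n_rows):
                    m[i][j] *= factor
        # Columns now match; stop once the rows match as well (assumed rule).
        err = sum(abs(sum(m[i]) - row_targets[i]) for i in range(n_rows))
        if err < tol:
            break
    return m


counts = [[10, 20], [30, 40]]
fit = ipfp(counts)
# Close to [[12, 18], [28, 42]], i.e. x_{i+} * x_{+j} / x_{++}.
print(fit)
```

With initial values of 1 and no structural zeros, a single row step followed by a single column step already reproduces the independence fit, so the loop stops after the first pass. Passing <code>row_targets</code> and <code>col_targets</code> corresponds to the note on fitting arbitrary positive marginals, and <code>structural_zeros</code> to the Q-model note.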
|